Abstract: In recent years, intelligent data-driven fault diagnosis methods for gearboxes have been
successfully developed and widely applied in industry. Currently, most machine learning
techniques require that the training and testing data come from the same distribution. However, this
assumption is difficult to meet in real industrial settings, since gearbox operating conditions usually
change in practice. This results in a significant data distribution gap and deteriorated diagnostic
performance when the learned knowledge is applied to new conditions. This paper proposes a deep
learning-based domain adaptation method to address this issue. The raw current signals, which are
easy to collect in industry, are used directly as the model inputs for diagnostics, which facilitates
practical applications. The maximum mean discrepancy metric is introduced into the deep neural
network, and its optimization promotes the extraction of generalized machinery health condition
features across different operating conditions. Experiments on a real-world gearbox condition
monitoring dataset validate the effectiveness of the proposed method, which offers a promising tool
for cross-domain diagnosis in industry.

I. INTRODUCTION

In the past decades, rotating machines have been widely used in many industries, such as
manufacturing, aerospace, and automotive. The gearbox is one of the key components in rotating
machines, delivering torque and providing speed conversion. Effective and timely fault diagnosis
of gearboxes is therefore of great importance in industry, since it can optimize maintenance
schedules, enhance operational safety, and reduce economic costs [1]. Traditionally, many
model-based signal processing methods have been used for gearbox fault signal analysis [2].
While effective diagnosis results have been obtained, the model-based approaches generally rely
on expert knowledge and require much human labor for model development. Therefore, they are
less efficient for applications in real industrial scenarios.

Moreover, the smart manufacturing initiative has established a consistent method for data access
across different enterprises, helping predictive manufacturing and fault diagnosis to advance at a
rapid pace [3,4]. In general, data-driven methods can achieve high diagnosis accuracy and fast
implementation [5]. Furthermore, they generally require little prior expertise in signal processing
and gearbox dynamics modeling, which largely facilitates industrial applications. In the
literature, popular data-driven methods include artificial neural networks (ANN), random forests
(RF), support vector machines (SVM), and so forth. Recently, deep learning has emerged as a
highly effective approach for data processing, which promises to further improve the performance
of existing data-driven approaches [6]. Basically, deep learning methods are capable of
efficiently capturing the underlying relationship between input and output data through multiple
linear and non-linear data transformations [7]. Specifically, with respect to fault diagnosis
problems, the machinery health states can be well predicted using the collected condition
monitoring data, despite the high dimensionality of the signals [8,9]. The authors in [10]
proposed using a convolutional neural network (CNN) for gearbox fault diagnosis and achieved
significantly better classification accuracy than classical machine learning methods. A fault
diagnosis method for wind turbine gearboxes based on a stacked auto-encoder and multi-class SVM
was proposed in [11]. A deep belief network fault diagnosis method based on manually extracted
time- and frequency-domain features was proposed in [3] for gearbox and bearing applications.
These studies emphasize the significant improvement in gearbox fault diagnosis performance
achieved by deep learning-based methods compared with conventional data-driven methodologies.

It should be pointed out that, while promising diagnosis performance has been obtained using deep
learning, the main assumption is that the training and testing data come from the same
distribution. That means the labeled training data and unlabeled testing data should be collected
under similar gearbox operating conditions. However, working conditions such as load and rotating
speed usually change across practical industrial tasks. This results in a significant
distribution discrepancy between training and testing data, which deteriorates the generalization
performance of data-driven models [12]. To address this problem, transfer learning algorithms
have been proposed in recent years [10,13], which aim to generalize the data-driven knowledge
learned from the training condition, denoted as the source domain, to the testing condition,
denoted as the target domain. Specifically, domain adaptation (DA) techniques have been widely
developed in the fault diagnosis field [14,15]; they assume that the training and testing data
share the same label space, which is consistent with machinery health condition identification
problems [16]. Domain-invariant features across different conditions are expected to be learned
with domain adaptation methods, so that stronger model generalization ability can be achieved. A
framework for gearbox domain adaptation based on a deep neural network was proposed in [17],
where only the source domain data and healthy data from the target domain were used to accomplish
the DA tasks. In [18], a DA approach for fault diagnosis of low-speed bearings was proposed. The
authors used an acoustic spectral imaging technique to convert time-domain acoustic emission
signals into representative images for different health conditions. These images were used in a
DA model to predict the labels of the target domain dataset. A deep CNN-based DA method for
gearbox fault diagnosis based on vibration signals was proposed in [19]. In that approach, the
raw time-domain vibration signal was converted to gray-scale images and used as input to the CNN
model. The authors first trained a CNN model on the source dataset and then fine-tuned it using
the target domain samples. In general, deep learning-based domain adaptation methods have shown
great potential in bridging the domain gap between different working conditions
[20, 21].

In the current literature, machinery vibration data are the main focus for fault diagnosis [22],
since the vibration signal is representative of the periodic events in the gearbox, and the
behavior of the gearbox is expected to change in the presence of any kind of mechanical
abnormality. Among other kinds of signals, the torque measurement has seldom been investigated.
The authors in [23] discussed torsional vibration analysis as a potential approach for fault
diagnosis in fixed-shaft gearboxes. The use of the torque signal for fault diagnosis of planetary
gearboxes was discussed in [24], where the authors proposed a diagnosis method based on the
demodulated spectra of the amplitude envelope and instantaneous frequency. The study by Qiao et
al. [25] on wind turbine mechanical components pointed out the usefulness of the torque signal in
detecting gearbox faults. Furthermore, Mohanty et al. [26] stated that the current signal of the
induction motor driving the gearbox is useful for fault diagnosis, and that motor current
signature analysis (MCSA) can be largely improved using their proposed demodulation method. The
effectiveness of MCSA in rotating machinery fault diagnosis problems was also validated in
[27,28]. Therefore, it is feasible and promising to explore current signals, which are easy to
collect in industry, for gearbox health identification. However, it should be pointed out that
the existing methods are mostly complicated and require sophisticated domain knowledge of gearbox
modeling and signal processing, which makes them difficult to implement in different applications.

This paper proposes
a deep learning-based domain adaptation method for gearbox fault diagnosis. An end-to-end
diagnostic framework is built, which takes the raw collected data as input and directly outputs
the diagnosis results. Current signals are investigated in this study, since they are generally
easier to collect than the popular vibration data in real industrial scenarios. The maximum mean
discrepancy metric is introduced to measure and minimize the data distribution distance between
different domains, so that generalized diagnostic features of different machinery health
conditions can be extracted. Experiments on real-world gearbox datasets are carried out for
validation, and the results show that the proposed method is capable of effectively diagnosing
gearbox faults across different operating scenarios.

The remainder of this paper starts with the preliminaries in Section II. The proposed fault
diagnosis method is presented in Section III and experimentally validated and investigated in
Section IV. We close the paper with conclusions in Section V.

II. PRELIMINARIES

A.  Deep  Convolutional  Neural  Network 

In the past years, deep learning, also referred to as deep neural networks, has achieved great
success in different applications. Beyond the basic multi-layer perceptron (MLP) structure, the
convolutional neural network (CNN) architecture is more efficient at feature extraction and can
process high-dimensional machinery data well [7]. Basically, multiple convolutional layers are
stacked in the CNN structure to model the relationship between input and output. Specifically,
the one-dimensional CNN is adopted in this study, which is well suited to processing the
measurement signals of gearboxes. Pooling is usually applied after the convolutional layers.
Average-pooling and max-pooling operations are commonly adopted, which take the average and
maximum values of the local data, respectively. In this way, the most significant features can
be extracted and the data dimensions can be reduced, which increases the computing efficiency of
deep learning. By exploiting the convolutional and pooling operations, high-level features can
be obtained from the raw data and then used for the final task, i.e. machinery fault diagnosis.
Readers are referred to [7,29] for more detailed descriptions of CNNs.

B. Domain Adaptation

To bridge the gap between different data distributions in machine learning, transfer learning
techniques have been successfully developed and widely applied [30]. Specifically, the domain
adaptation method in transfer learning has been receiving increasing attention in fault diagnosis
studies, since the machinery health condition label spaces are usually identical. In general,
domain adaptation approaches aim to learn domain-invariant features from different conditions,
which helps the fault diagnostic knowledge generalize across different cases [31]. In this paper,
the maximum mean discrepancy (MMD) metric is adopted, which measures the distance between the
distributions of the source and target domains. The optimization of MMD is able to achieve domain
fusion in the high-level representation sub-space of deep neural networks, and thus extract
generalized features for diagnosis [15]. The MMD metric is defined as the squared distance
between the kernel embeddings of the marginal data distributions in the reproducing kernel
Hilbert space (RKHS) as

$$\mathrm{MMD}_k(P,Q) = \left\| \mathbb{E}_P[\phi(x^s)] - \mathbb{E}_Q[\phi(x^t)] \right\|^2_{\mathcal{H}_k}, \tag{1}$$

where $\mathcal{H}_k$ denotes the RKHS endowed with the characteristic kernel $k$, and
$\phi(\cdot)$ is the associated feature map. Based on the current understanding of MMD [32],
kernel choice is one of the key factors in domain adaptation, since different kernels embed the
probability distributions in different RKHSs, in which different orders of statistics are
explored. Therefore, multiple kernels are employed in the MMD in this paper to leverage different
kernels and achieve improved performance. In the implementation, $N_k$ RBF kernels are used as
in [33],

$$k(x^s, x^t) = \sum_{i=1}^{N_k} k_{\sigma_i}(x^s, x^t), \tag{2}$$

where $k_{\sigma_i}$ denotes a Gaussian kernel with bandwidth coefficient $\sigma_i$. In this
study, three kernels are adopted, and the bandwidth parameters are selected as 2, 4 and 8.
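
As an illustration of Equations (1) and (2), the following NumPy sketch computes a biased empirical estimate of the multi-kernel MMD between two batches of feature vectors. It is not the authors' TensorFlow implementation: the function names are hypothetical, and it assumes each Gaussian kernel has the form exp(-||a - b||^2 / (2 sigma^2)) with sigma taken from {2, 4, 8}, since the paper does not state the exact parameterization of the bandwidth coefficient.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b.

    Assumed form: exp(-||a_i - b_j||^2 / (2 * sigma^2)).
    """
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against small negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def multi_kernel(a, b, sigmas=(2.0, 4.0, 8.0)):
    """Equation (2): sum of Gaussian kernels with different bandwidths."""
    return sum(gaussian_kernel(a, b, s) for s in sigmas)

def mmd2(xs, xt, sigmas=(2.0, 4.0, 8.0)):
    """Biased empirical estimate of Equation (1) for source batch xs and target batch xt."""
    k_ss = multi_kernel(xs, xs, sigmas).mean()
    k_tt = multi_kernel(xt, xt, sigmas).mean()
    k_st = multi_kernel(xs, xt, sigmas).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
source = rng.normal(size=(50, 4))
print(mmd2(source, source))        # identical batches give zero discrepancy
print(mmd2(source, source + 3.0))  # a shifted batch gives a clearly positive value
```

The estimate expands the squared RKHS distance into three kernel averages, so the feature map itself never has to be computed explicitly.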

III.  PROPOSED  FAULT  DIAGNOSIS  METHOD 

The proposed method is illustrated in Figure 1 and consists of four individual steps. The key
functionalities of each step are presented and discussed in detail below.

Fig. 2: Data augmentation with overlap.


A.  Data  Partitioning 

In the first phase, the raw time-domain sensor data collected from a gearbox are partitioned into
two sets: (a) source domain data (labeled) and (b) target domain data (unlabeled). The target
domain data are further partitioned into training and testing sets: one unlabeled subset is used
in training the CNN model, and the other subset is used for testing the trained model.

B.  Data  Modeling 

There are two major steps for preparing the data prior to training the diagnosis model, which
are presented as follows.

1) Data augmentation: In order to increase the number of training samples, a windowing method is
used. As depicted in Figure 2, a window with a fixed sample size moves over a time series signal
and generates multiple samples. For example, a signal with 1,000,000 points provides 191 training
samples of length 50,000 when the shift size is 5,000 points.

2) Fast Fourier Transform (FFT): In order to eliminate the impact of the supply line frequency,
the FFT is applied to each sample generated by the augmentation process. Fault signatures are
expected to appear as sidebands around the supply line frequency (or running frequency) in the
FFT spectrum [34]. All samples after the FFT are directly used in the deep learning model for
feature learning and fault diagnosis.
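
The windowing arithmetic and the per-sample FFT described above can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names, not the authors' code; it assumes that the one-sided magnitude spectrum is what is passed to the network, which the text does not specify.

```python
import numpy as np

def sliding_windows(signal, window, shift):
    """Cut a 1-D signal into overlapping windows of fixed length.

    Returns an array of shape (n_windows, window), where
    n_windows = (len(signal) - window) // shift + 1.
    """
    signal = np.asarray(signal)
    # All windows with stride 1 (a memory-free view), then keep every `shift`-th one.
    all_windows = np.lib.stride_tricks.sliding_window_view(signal, window)
    return all_windows[::shift]

def magnitude_spectrum(samples):
    """One-sided FFT magnitude of each row (assumed network input)."""
    return np.abs(np.fft.rfft(samples, axis=1))

# Sample count for the example in the text: 1,000,000 points, window 50,000, shift 5,000.
n_samples = (1_000_000 - 50_000) // 5_000 + 1
print(n_samples)  # 191
```

With a real-valued window of length W, `magnitude_spectrum` returns W // 2 + 1 frequency bins per sample.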

C.  Deep  Learning  Model  Formulation 

For the network optimization, two terms are included in the objective, i.e. the source-domain
classification loss and the domain discrepancy loss. First, following the typical machine
learning paradigm, the empirical health condition identification errors on the source domain
should be minimized, and the cross-entropy loss function $L_s$ is adopted in this study, which is
defined as

$$L_s = -\frac{1}{n_s} \sum_{i=1}^{n_s} \sum_{j=1}^{N_c} 1\{y_i = j\} \log \frac{e^{x^s_{i,j}}}{\sum_{m=1}^{N_c} e^{x^s_{i,m}}}, \tag{3}$$

where $n_s$ denotes the number of source-domain training samples, $x^s_{i,j}$ is the $j$th
element of the network output vector for the $i$th labeled source-domain sample, and $y_i$ is the
label of the $i$th source-domain sample. $N_c$ represents the number of machinery health
conditions concerned. Besides the basic supervised learning part, the discrepancy between the
source and target domains should be minimized, and the MMD metric is adopted to measure and
optimize the domain gap in this study, as described in Section II-B. Specifically, the MMD loss
$L_d$ is defined as

$$\min L_d = \mathrm{MMD}_k(P_S, P_T), \tag{4}$$

where $P_S$ and $P_T$ denote the distributions of the high-level representations of the source-
and target-domain data, respectively, in the last fully-connected layer of the network. In
summary, the losses in Equations (3) and (4) can be combined, and the final optimization
objective $L_{opt}$ can be expressed as

$$\min L_{opt} = L_s + L_d. \tag{5}$$

Once the network is trained, the unlabeled testing target-domain data are used for fault
diagnosis, and the performance of the proposed method is reported.
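
To make the objective concrete, the sketch below evaluates the cross-entropy term of Equation (3) and adds a domain discrepancy value as in Equation (5). It is a NumPy illustration under stated assumptions rather than the authors' training code: the function names are hypothetical, class labels are 0-based integers (the equation indexes them from 1), and the MMD term is passed in as a precomputed number.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Equation (3): mean softmax cross-entropy over the source samples.

    logits: (n_s, N_c) raw network outputs; labels: (n_s,) integer classes in 0..N_c-1.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample (the indicator term).
    return -log_softmax[np.arange(len(labels)), labels].mean()

def total_objective(source_logits, source_labels, mmd_value):
    """Equation (5): L_opt = L_s + L_d, with L_d supplied by the caller."""
    return softmax_cross_entropy(source_logits, source_labels) + mmd_value

logits = np.zeros((4, 5))          # uninformative outputs for five health conditions
labels = np.array([0, 1, 2, 3])
print(total_objective(logits, labels, mmd_value=0.25))  # equals log(5) + 0.25
```

Only the source batch contributes to the first term, because target-domain samples are unlabeled; they influence training solely through the discrepancy term.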


Fig. 3: The experimental setup of the test rig [35].

IV. EXPERIMENTAL STUDY

A.  Test  Rig 

A validation study has been conducted on a dataset acquired from a gearbox prognostic simulator
(GPS) built by the Spectra Quest Company [35], as shown in Figure 3. Two opposing electric motors
are used in the test rig: one motor drives the gearbox and the other provides the
resistance/load. Both motors are three-phase induction motors rated at 10 hp with two pairs of
poles. A current sensor (HTA 100) was installed on the drive motor, and its signal was used in
our analysis for fault diagnosis. The data were recorded using a computer with a National
Instruments acquisition card (NI 4472 series) at a sampling rate of 50 kS/s. The monitored
gearbox is composed of four spur gears (Figure 4). The first gear, which is driven by the motor
that drives the test bench, has 32 teeth. It is the one replaced by gears in different health
states, while the rest are left unchanged. It is followed by a gear with 80 teeth. On the same
shaft, a gear with 48 teeth is mounted, which meshes with a gear with 64 teeth, resulting in an
overall transmission ratio of 3.33.
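
As a quick check of the stated ratio, and assuming that the overall ratio is the product of the two mesh ratios (driven teeth over driving teeth at each stage), the tooth counts above give:

```latex
\[
i_{\text{total}} = \frac{80}{32} \times \frac{64}{48}
                 = \frac{5}{2} \times \frac{4}{3}
                 = \frac{10}{3} \approx 3.33
\]
```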

In this study, the torque load applied to the gearbox was increased in steps of 0%, 40%, 80%,
and 100%. At each load, the operational speed was kept constant at 1500 rpm, and each run was
repeated 15 times to reduce the impact of randomness and uncertainties. Table I summarizes the
experiments, and a comparison between the raw motor current measurements and the corresponding
FFT spectra for different loads in the healthy condition is given in Figure 5. As the load
increases, the amplitudes of both the raw current signal and the FFT spectrum increase
significantly. Figure 6 shows the five health conditions examined in this paper, and the impact
of FFT analysis on distinguishing different faults at 0% load is shown in Figure 7. As shown,
the raw motor current measurements do not show significant differences between health conditions;
however, the conditions are clearly distinguishable in the FFT spectrum. The proposed method is
tested on six transfer tasks, i.e. T1-2, T1-3, T1-4, T4-3, T4-2 and T4-1, where Ti-j denotes
transfer from experiment i (source domain) to experiment j (target domain).

A summary of the data segmentation for the different tasks is given in Table II, where Nsource
and Ntarget represent the number of samples from each class of the source and target domain
datasets, respectively. All experiments are performed on a PC with 16 GB RAM, a Core i5 CPU, and
an NVIDIA GeForce RTX 2080 Ti GPU. The programming is done in TensorFlow, and GPU computing is
used to reduce the model training time.


Fig.  4:  Illustration  of  the  gear  disposition  inside  the 
experimental  gearbox. 


Fig. 5: Comparison between healthy data at different load conditions.

B.  Model  Architecture  Design 

As shown in Figure 8, the first step is to design a CNN architecture and tune the network
parameters. In this study, a stack of four convolutional and pooling layers followed by a
max-pooling layer is used for model training. The impact of the filter size (Fs) and filter
number (Nf) on the cross-domain diagnosis performance for task T1-2 is shown in Figure 9.
Generally, larger values of Nf and Fs lead to higher diagnosis accuracy, but the improvement is
relatively limited. Moreover, the training time increases significantly with Nf and Fs.
Therefore, Nf = Fs = 20 was selected for the final model. The batch size (Nb) is another tuning
parameter that may significantly affect the diagnosis accuracy. For our dataset, a small batch
size leads to the worst diagnosis results, while an overly large batch size produces a large
cumulative descent step when updating the parameters, especially when the MMD loss is integrated
in the model, so that the prediction accuracy also drops. Therefore, it is important to choose a
reasonable trade-off value for Nb, and Nb = 64 was selected for the final diagnosis model. The
confusion matrix corresponding to the final diagnosis results in task T1-2 is illustrated in
Figure 10. It is observed that only two classes, 'eccentricity' and 'missing tooth', are slightly
misclassified, while all other classes are correctly classified.
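
For readers unfamiliar with the convolution and pooling stack referred to above, the following NumPy sketch runs a forward pass through four stacked stages with Nf = Fs = 20. It only illustrates the data flow and is not the authors' TensorFlow model: the ReLU activation, the pooling size of 2, the 'valid' padding, the random weights, and the input length of 1024 are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def conv1d(x, w, b):
    """'Valid' 1-D convolution (cross-correlation, as in most deep learning libraries).

    x: (length, in_channels); w: (filter_size, in_channels, n_filters); b: (n_filters,)
    """
    fs = w.shape[0]
    # windows[l, c, f] = x[l + f, c]; shape (length - fs + 1, in_channels, fs)
    windows = np.lib.stride_tricks.sliding_window_view(x, fs, axis=0)
    return np.einsum("lcf,fcn->ln", windows, w) + b

def max_pool1d(x, size):
    """Non-overlapping max pooling along the length axis (trailing remainder dropped)."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)

def conv_block(x, w, b, pool=2):
    """Convolution, then ReLU (assumed activation), then max pooling."""
    return max_pool1d(np.maximum(conv1d(x, w, b), 0.0), pool)

NF = FS = 20                             # filter number and filter size from the text
rng = np.random.default_rng(0)
features = rng.normal(size=(1024, 1))    # one single-channel input sample (assumed length)
in_channels = 1
for _ in range(4):                       # four stacked convolution + pooling stages
    w = rng.normal(scale=0.1, size=(FS, in_channels, NF))
    features = conv_block(features, w, np.zeros(NF))
    in_channels = NF
print(features.shape)  # (46, 20)
```

Each stage shortens the sequence by Fs - 1 points and then halves it, which is how the stack reduces the data dimension while increasing the number of feature channels.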

Table I: Experimental details.

Experiment Number    Load    Speed (rpm)
1                    0%      1500
2                    40%     1500
3                    80%     1500
4                    100%    1500

C.  Results  and  comparison 

In this section, different experiments are used to evaluate the performance of the proposed
method, and a comparison with recent related works is also presented.

1) Effects of training sample size

The performance of the final model in the different tasks, i.e. T1-2, T1-3, T1-4, T4-3, T4-2 and
T4-1, for different source domain sample sizes, Nsource, is illustrated in Figure 11. In this
study, the number of target samples, Ntarget, is kept constant at 300. With increasing Nsource,
the testing accuracy increases and the prediction uncertainty (measured by the standard
deviation) reduces significantly. The proposed CNN-based domain adaptation method provides
acceptable testing accuracy even with a small number of source training samples. As presented in
Figure 11, the testing accuracy achieved in some tasks, such as T1-2 and T4-3, is higher than in
other tasks. This observation is due to the nature of the data and the similarity between the
distributions of the source and target domains. For instance, the load variation from experiment
#1 to experiment #2 is 40%, which is smaller than that between experiment #1 and experiment #4
(i.e. 100%). Therefore, the transfer of learned features from experiment #1 to experiment #2 is
easier. In addition, achieving high accuracies in tasks from low to high operational loads and
vice versa indicates that the proposed method performs well in both transfer directions. The
results achieved for the different tasks also clearly illustrate the effectiveness of the motor
current signal for cross-domain fault diagnosis. As presented, the diagnosis performance improves
with the number of training samples, which follows the same pattern as classical fault diagnosis
methods.


Fig.  6:  Different  health  conditions  examined  in  this  paper. 

2)  Classification  Results  and  Comparison 

Performance  of  the  proposed  transfer  learning 
methodology  is  compared  with  two  groups  of  fault 
diagnosis  tools  as  summarized  below: 

Group A: Supervised classification methods:

1) LDA [36]: Linear Discriminant Analysis is a supervised algorithm that uses a linear
transformation matrix to project features from the parametric space to the feature space.

2) SVM [37]: Support Vector Machines are supervised machine learning algorithms that can be
employed for both regression and classification problems. SVMs are designed based on the
Structural Risk Minimization criterion in statistical learning theory. SVMs work on a simple
idea: to identify a hyperplane that separates the training data into two distinct classes.

3) CNN without domain adaptation (No-DA): A deep learning method that automatically extracts
features from the raw signal measurement. A typical classifier is obtained by considering only
the classification loss Ls in Equation (5). This trained model is directly used for testing on
the target dataset.

[Figure 7 panel: all classes at 0% load; horizontal axis: Time (s); legend: Healthy, Pitting,
Missing tooth, Eccentricity, Chipped]

Fig. 7: Different health conditions indicated in raw time domain and frequency spectrum.


Fig.  8:  The  proposed  deep  neural  network  architecture 


Group B: Transfer learning methods:

1) TCA [38]: Transfer Component Analysis is used to find a feature subspace in the domain
adaptation field. In the subspace spanned by the transfer components, the source and target data
distributions are similar. Once the subspace is created, an SVM classifier is trained with the
labeled source domain dataset and used to obtain the accuracy on the target domain.

2) JDA [39]: Joint Distribution Adaptation is a modification of TCA. It is able to simultaneously
adapt the conditional and marginal distributions during the dimensionality reduction process.

3) GFK [40]: Geodesic Flow Kernel is an unsupervised domain adaptation technique wherein the
source and target domain data are projected into a linear subspace while the shortest path
connects the two original domains.

4) BDA [41]: Balanced Distribution Adaptation aims to automatically balance the significance of
the marginal and conditional distribution discrepancies, and can therefore effectively adjust to
a specific transfer task.

5) T-S [42]: This method performs adaptation by learning a target-specific network from the
source-specific network.

Table II: Data segmentation for transfer tasks.

Transfer Task    Source Sample Number    Target Sample Number
T1-2             5 x Nsource             5 x Ntarget
T1-3             5 x Nsource             5 x Ntarget
T1-4             5 x Nsource             5 x Ntarget
T4-3             5 x Nsource             5 x Ntarget
T4-2             5 x Nsource             5 x Ntarget
T4-1             5 x Nsource             5 x Ntarget

In Group A, three classification methods are used to learn representative features from the
source training data in a supervised process; the trained classifier is then applied to the
target domain data for testing, and the results achieved are reported. Hand-crafted time- and
frequency-domain features such as standard deviation, mean, peak-to-peak, kurtosis, frequency
amplitude and energy are used as the input to the LDA and SVM methods. For No-DA, raw
frequency-domain data are utilized. Because these methods inherently do not consider the domain
variation between the source and target datasets, low classification performance is expected. In
Group B, the extracted time and frequency features are used for the domain adaptation tasks, and
the results achieved are compared with those of the proposed method. Analyses are conducted on
300 samples obtained from the source and target datasets, and the diagnosis results obtained on
the testing (target domain) data are visualized in Figure 12. Compared with the other methods,
the proposed approach provides the highest accuracies in all six transfer tasks, and the
accuracies are generally higher than 91%, which illustrates the effectiveness of the proposed
transfer learning approach. The average performance improvement of the proposed method is 57.46%,
55.68%, 39.3%, 36.62%, 35.87%, 34.5%, 26.67% and 2.75% compared with LDA, SVM, GFK, JDA, TCA,
BDA, No-DA and T-S, respectively. The second-best performance is obtained by T-S, and No-DA
ranks third. Overall, the domain adaptation methods in Group B outperform the classical diagnosis
methods LDA and SVM in Group A, but they are not as promising as the proposed method.

Fig. 9: Impact of filter size and filter number on the testing accuracy for task T1-2.


Fig. 10: The confusion matrix for the classification results in task T1-2.


The performance of the different diagnosis methods with a
small number of training samples, e.g., Nsource = 60 and
Ntarget = 300, is illustrated in Figure 13. As expected,
using fewer labeled samples for training lowers the
testing accuracy of all evaluated methods. This
observation is consistent with previous studies on deep
learning methods, in which more training data lead to
better diagnosis performance, and transfer learning-based
diagnosis methods follow the same pattern. Moreover,
comparing the results of the methods in Group A (without
domain adaptation) with those of the methods in Group B
and the proposed method shows the significant impact of
cross-domain adaptation on fault diagnosis performance.
T-S, which provides an alternative way for domain
adaptation, shows good performance with a large training
sample size. However, with a small sample size, its
performance deteriorates significantly because this method
minimizes the distribution discrepancy between the target
dataset and the learned representations from the source
training network, which presumably become less reliable
when few source samples are available. The achieved
results illustrate the effectiveness of the motor current
signal for cross-domain fault diagnosis.
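
The small-sample setting above (Nsource = 60) requires drawing a reduced labeled source set. This section does not say how the subset was selected, so the sketch below assumes class-balanced random subsampling. The function name `stratified_subsample`, the five classes, and the random data are hypothetical.

```python
import numpy as np


def stratified_subsample(X, y, n_total, seed=0):
    """Draw a class-balanced random subset of about n_total samples.

    Each class contributes n_total // n_classes samples, so the result
    has fewer than n_total samples when the division is not exact.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    per_class = n_total // len(classes)
    chosen = []
    for c in classes:
        members = np.flatnonzero(y == c)
        if len(members) < per_class:
            raise ValueError(f"class {c} has fewer than {per_class} samples")
        chosen.append(rng.choice(members, size=per_class, replace=False))
    idx = np.concatenate(chosen)
    rng.shuffle(idx)  # avoid class-ordered batches during training
    return X[idx], y[idx]


# Hypothetical source set: 300 signals of length 128, five classes.
rng = np.random.default_rng(1)
X_source = rng.normal(size=(300, 128))
y_source = np.repeat(np.arange(5), 60)

X_small, y_small = stratified_subsample(X_source, y_source, n_total=60)
print(X_small.shape, np.bincount(y_small))
```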


Fig. 11: Performance of the proposed method at different
tasks and for different training sample sizes. [Axis:
Accuracy (%), ticks from 70 to 100.]


3)  Visualization  of  the  learned  features 

In order to illustrate the effectiveness of our approach, the
t-distributed Stochastic Neighbor Embedding (t-SNE)
technique is adopted to visualize the high-level feature
representations by mapping them from the original feature
space into a 2-D map. The visualization is performed
on task T1-2 for the proposed method and also for the CNN
without domain adaptation (No-DA method).

Fig. 12: The achieved testing accuracy for different
comparative methods and in all six transfer tasks.

Fig. 13: Fault diagnosis results with the low number of
source domain samples (Nsource = 60) in all six transfer
tasks. [Axis: Transfer Task.]
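
The t-SNE mapping described in this subsection can be sketched with scikit-learn's `TSNE`. The feature arrays below are random stand-ins for the fully connected layer activations, and the feature dimension, sample counts, and perplexity are assumptions rather than the paper's settings. Source and target features are embedded jointly so that both domains share one 2-D map.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-ins for high-level features of each domain.
feat_source = rng.normal(loc=0.0, scale=1.0, size=(30, 16))
feat_target = rng.normal(loc=0.5, scale=1.0, size=(30, 16))

# Stack both domains and keep a domain indicator for plotting later.
features = np.vstack([feat_source, feat_target])
domain = np.array([0] * len(feat_source) + [1] * len(feat_target))

# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

emb_source = embedding[domain == 0]
emb_target = embedding[domain == 1]
print(embedding.shape)
```

The two returned point sets can then be scattered with different markers per domain and different colors per health condition.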


Figure 14 illustrates the visualization of the learned
features in the fully connected layer of the source domain
classifier without domain adaptation. As observed, without
domain adaptation, samples from each identical class in the
source or target data cluster together. However, for some
labels there is a notable distribution discrepancy between
the source and target domain samples. Since the feature
space is divided into several regions associated with
different labels, a low diagnosis performance is expected
on the target domain data. Therefore, it is necessary to
bridge the distribution discrepancy between the source and
target data to improve the classification results on the
target data. With domain adaptation, as shown in Figure 14,
the source and target domain features are projected into
the same region as the model is trained. Accordingly, the
distribution discrepancy between the source and target
domains is reduced significantly, and samples from
different conditions are separated clearly. These two
requirements, a) minimal distribution discrepancy between
the two domains and b) clear differentiation between
different health conditions in both domains, would
guarantee an accurate cross-domain fault diagnosis. As
illustrated, the cross-domain invariant features obtained
by the proposed method are clustered well: features from
different classes are separated clearly, and only a small
amount of overlap is observed between the ‘Eccentricity’
and ‘Missing tooth’ faults in the source and target
domains.

Fig. 14: Extracted features in the fully connected layer are
visualized for task T1-2 at Nsource = 300. Both the
scenarios before and after domain adaptation are
presented. [Panels: Before Domain Adaptation; After
Domain Adaptation.]
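
The distribution discrepancy discussed above is what the maximum mean discrepancy (MMD) metric quantifies. As a minimal sketch, the biased empirical estimate of squared MMD with a single Gaussian kernel is shown below. The kernel choice, the bandwidth `sigma`, and the function names are assumptions for illustration, and the configuration used in the proposed network may differ.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian kernel values between the rows of a and b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against rounding below zero
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two feature sets."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


# Hypothetical features: one target set matches the source, one is shifted.
rng = np.random.default_rng(0)
feat_src = rng.normal(0.0, 1.0, size=(200, 8))
feat_tgt_same = rng.normal(0.0, 1.0, size=(200, 8))
feat_tgt_shift = rng.normal(1.5, 1.0, size=(200, 8))

mmd_same = mmd2(feat_src, feat_tgt_same, sigma=3.0)
mmd_shift = mmd2(feat_src, feat_tgt_shift, sigma=3.0)
print(f"matched domains: {mmd_same:.4f}, shifted domains: {mmd_shift:.4f}")
```

A larger value indicates a larger gap between the two feature distributions, which is why minimizing this quantity during training pulls the source and target features toward the same region.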

V.  CONCLUSION 

In this paper, a deep learning-based domain adaptation
fault diagnosis method for gearboxes is proposed. An
end-to-end diagnostic model is established, which takes the
raw motor current data as inputs and directly outputs the
predicted health conditions. The maximum mean
discrepancy metric is used to bridge the distribution gap
between different gearbox operating conditions.
Experiments on a real-world gearbox condition monitoring
dataset are carried out for validation, and promising
cross-domain fault diagnosis performance is achieved by the
proposed domain adaptation method. This study offers a
new perspective on enhancing the generalization ability of
fault diagnosis models across different gearbox operating
scenarios. The reliance of most existing methods on
vibration signals is also alleviated, and effective
diagnostic performance can be obtained using only the
easily collected current data. However, it should be pointed
out that the main limitation of this study lies in the
assumption that target-domain data are available during
training. Further research will be carried out on developing
robust fault diagnosis models for different scenarios
without the availability of the target-domain data in
advance.